FusedBatchNorm

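Performs fused batch normalization on the input tensor. The normalization units are split across cores, which carry out the normalization and the affine transform in parallel.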

\[\hat{x}_{b,c} = \frac{x_{b,c} - mean_c}{\sqrt{variance_c + \epsilon}}, \quad y_{b,c} = scale_c \cdot \hat{x}_{b,c} + offset_c\]
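
The two steps above can be folded into one multiply-add per element, which is the usual motivation for calling the operation "fused". The coefficients \(\alpha_c\) and \(\beta_c\) below are illustrative names introduced here; whether the library precomputes them internally is not stated in this document.

```latex
\[
y_{b,c} = \alpha_c \, x_{b,c} + \beta_c, \qquad
\alpha_c = \frac{scale_c}{\sqrt{variance_c + \epsilon}}, \qquad
\beta_c = offset_c - \alpha_c \cdot mean_c
\]
% Check with round numbers, taking \epsilon = 0 for readability:
% x = 3, mean = 1, variance = 4, scale = 2, offset = 0.5
% direct: \hat{x} = (3 - 1) / 2 = 1,  y = 2 \cdot 1 + 0.5 = 2.5
% folded: \alpha = 2 / 2 = 1,  \beta = 0.5 - 1 \cdot 1 = -0.5,  y = 1 \cdot 3 - 0.5 = 2.5
```
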
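Inputs:
  • input - base address of the input tensor, with shape [unit, channel].

  • scale - base address of the scale-factor array, of length channel.

  • offset - base address of the offset (shift) array, of length channel.

  • mean - base address of the per-channel mean array used for normalization, of length channel.

  • variance - base address of the per-channel variance array used for normalization, of length channel.

  • epsilon - numerical-stability term added to the variance under the square root.

  • channel - number of channels.

  • unit - number of normalization units (batch size × height × width).

  • core_mask (int, optional) - core mask (applies only to the shared-memory version).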

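Outputs:
  • output - base address of the tensor that receives the fused batch normalization result, with the same [unit, channel] shape as input.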

Supported platforms:

FT78NE MT7004

Notes

  • FT78NE supports the fp32 data type.

  • MT7004 supports the fp16 and fp32 data types.

Shared-memory version:

void hp_fusedbatchnorm_s(const half *input, const half *scale, const half *offset, const half *mean, const half *variance, float epsilon, int channel, int unit, int core_mask, half *output)
void fp_fusedbatchnorm_s(const float *input, const float *scale, const float *offset, const float *mean, const float *variance, float epsilon, int channel, int unit, int core_mask, float *output)

C usage example:

// FT78NE multi-core example
// The library header that declares fp_fusedbatchnorm_s must also be included.
#include <stdio.h>

int main(void) {
    const float *input = (const float *)0xA0000000;   // DDR memory
    const float *scale = (const float *)0xB0000000;
    const float *offset = (const float *)0xB0001000;
    const float *mean = (const float *)0xB0002000;
    const float *variance = (const float *)0xB0003000;
    float *output = (float *)0xC0000000;
    int channel = 64;
    int unit = 1024;
    float epsilon = 1e-5f;
    int core_mask = 0xff;
    fp_fusedbatchnorm_s(input, scale, offset, mean, variance,
                        epsilon, channel, unit, core_mask,
                        output);
    return 0;
}
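
The example passes core_mask = 0xff, and the description says the units are split across cores. The sketch below shows one way such a split could work. It rests on two assumptions that this document does not confirm: that each set bit of the mask enables one core, and that units are divided into contiguous, near-equal ranges. The helper names `count_cores` and `unit_range` are hypothetical.

```c
#include <assert.h>

/* Count the set bits in the mask.
 * Assumption: one bit per enabled core. */
static int count_cores(unsigned mask)
{
    int n = 0;
    while (mask) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}

/* Half-open unit range [begin, end) for the rank-th enabled core.
 * Assumption: contiguous split; the first (unit % cores) cores
 * each take one extra unit so the ranges cover all units exactly. */
static void unit_range(int unit, int cores, int rank, int *begin, int *end)
{
    assert(cores > 0 && rank >= 0 && rank < cores);
    int base = unit / cores;
    int rem = unit % cores;
    *begin = rank * base + (rank < rem ? rank : rem);
    *end = *begin + base + (rank < rem ? 1 : 0);
}
```

Under these assumptions, a mask of 0xff with unit = 1024 gives eight cores with 128 units each.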

Private-memory version:

void hp_fusedbatchnorm_p(const half *input, const half *scale, const half *offset, const half *mean, const half *variance, float epsilon, int channel, int unit, half *output)
void fp_fusedbatchnorm_p(const float *input, const float *scale, const float *offset, const float *mean, const float *variance, float epsilon, int channel, int unit, float *output)

C usage example:

// MT7004 single-core example
// The library header that declares the half type and hp_fusedbatchnorm_p must also be included.
#include <stdio.h>

int main(void) {
    const half *input = (const half *)0x10000000;     // L2 memory
    const half *scale = (const half *)0x10004000;
    const half *offset = (const half *)0x10008000;
    const half *mean = (const half *)0x1000C000;
    const half *variance = (const half *)0x10010000;
    half *output = (half *)0x10014000;
    int channel = 32;
    int unit = 256;   // 256 * 32 * 2 bytes = 0x4000, so input ends where scale begins
    float epsilon = 1e-4f;
    hp_fusedbatchnorm_p(input, scale, offset, mean, variance,
                        epsilon, channel, unit, output);
    return 0;
}
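
Both examples place buffers at fixed addresses, so the regions must be large enough and must not overlap. The sketch below computes the byte size of a [unit, channel] tensor and tests two address ranges for overlap. The helper names `tensor_bytes` and `regions_overlap` are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed by a dense [unit, channel] tensor whose elements
 * are elem_size bytes wide (2 for fp16, 4 for fp32). */
static size_t tensor_bytes(int unit, int channel, size_t elem_size)
{
    assert(unit >= 0 && channel >= 0);
    return (size_t)unit * (size_t)channel * elem_size;
}

/* Nonzero when [a, a + a_len) and [b, b + b_len) share a byte. */
static int regions_overlap(uintptr_t a, size_t a_len,
                           uintptr_t b, size_t b_len)
{
    return a < b + b_len && b < a + a_len;
}
```

With the values from the FT78NE example (unit = 1024, channel = 64, fp32), the input tensor occupies 262144 bytes (0x40000), which fits well below the scale array at 0xB0000000.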